AAAI 2019 - Human-AI Collaboration

Total: 17

#1 One-Network Adversarial Fairness

Authors: Tameem Adel ; Isabel Valera ; Zoubin Ghahramani ; Adrian Weller

Machine learning algorithms have an ever-growing impact on our lives, prompting the need for objectives other than pure performance, including fairness. Fairness here means that the outcome of an automated decision-making system should not discriminate between subgroups characterized by sensitive attributes such as gender or race. Given any existing differentiable classifier, we make only slight adjustments to the architecture, including the addition of a new hidden layer, to enable concurrent adversarial optimization for fairness and accuracy. Our framework provides one way to quantify the tradeoff between fairness and accuracy, while also leading to strong empirical performance.
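
As a rough illustration of the pattern this abstract describes (an added hidden representation trained adversarially against a sensitive-attribute predictor), here is a minimal PyTorch sketch. The gradient-reversal trick, the architecture, and all names are assumptions for illustration, not the paper's exact method.

```python
# Minimal sketch of one-network adversarial fairness (hypothetical names).
# An extra hidden layer feeds both the task head and an adversary that tries
# to recover the sensitive attribute; the encoder is trained to fool it.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad  # reverse gradients flowing back into the encoder

class FairNet(nn.Module):
    def __init__(self, d_in, d_hid, n_classes, n_sensitive):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU(),
                                     nn.Linear(d_hid, d_hid), nn.ReLU())  # added hidden layer
        self.task_head = nn.Linear(d_hid, n_classes)
        self.adv_head = nn.Linear(d_hid, n_sensitive)

    def forward(self, x):
        h = self.encoder(x)
        return self.task_head(h), self.adv_head(GradReverse.apply(h))

model = FairNet(d_in=20, d_hid=64, n_classes=2, n_sensitive=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
x, y, s = torch.randn(32, 20), torch.randint(0, 2, (32,)), torch.randint(0, 2, (32,))
for _ in range(100):
    y_hat, s_hat = model(x)
    # the weight on the adversarial term is an assumed knob trading
    # accuracy against fairness
    loss = ce(y_hat, y) + 1.0 * ce(s_hat, s)
    opt.zero_grad(); loss.backward(); opt.step()
```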

#2 Making Money from What You Know - How to Sell Information?

Authors: Shani Alkoby ; Zihe Wang ; David Sarne ; Pingzhong Tang

Information plays a key role in many decision situations. Rapid advances in communication technologies have made information providers more accessible, and various information-providing platforms can be found nowadays, most of them strategic in the sense that their goal is to maximize the provider's expected profit. In this paper, we consider the common problem of a strategic information provider offering prospective buyers information that can resolve uncertainties relevant to their decision making. Unlike prior work, we do not limit the information provider's strategy to price setting; rather, we give her flexibility over the way information is sold, specifically enabling querying about specific outcomes and the elimination of a subset of non-true world states, alongside the traditional approach of disclosing the true world state. We prove that when the buyer is self-interested (and the information provider does not know the true world state beforehand), all three methods (i.e., disclosing the true world-state value, offering to check a specific value, and eliminating a random value) are equivalent, yielding the same expected profit to the information provider. When the buyers are human subjects, an extensive set of experiments shows that the methods result in substantially different outcomes. Furthermore, using standard machine learning techniques, the information provider can rather accurately predict the performance of the different methods for new problem settings, and hence substantially increase her profit.

#3 Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff

Authors: Gagan Bansal ; Besmira Nushi ; Ece Kamar ; Daniel S. Weld ; Walter S. Lasecki ; Eric Horvitz

AI systems are being deployed to support human decision making in high-stakes domains such as healthcare and criminal justice. In many cases, the human and AI form a team, in which the human makes decisions after reviewing the AI’s inferences. A successful partnership requires that the human develops insights into the performance of the AI system, including its failures. We study the influence of updates to an AI system in this setting. While updates can increase the AI’s predictive performance, they may also lead to behavioral changes that are at odds with the user’s prior experiences and confidence in the AI’s inferences. We show that updates that increase AI performance may actually hurt team performance. We introduce the notion of the compatibility of an AI update with prior user experience and present methods for studying the role of compatibility in human-AI teams. Empirical results on three high-stakes classification tasks show that current machine learning algorithms do not produce compatible updates. We propose a re-training objective to improve the compatibility of an update by penalizing new errors. The objective offers full leverage of the performance/compatibility tradeoff across different datasets, enabling more compatible yet accurate updates.
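
A minimal sketch of what a "penalize new errors" re-training objective could look like, assuming a PyTorch classifier; the weighting scheme and names are hypothetical, not the authors' exact formulation.

```python
# Sketch of a compatibility-aware retraining loss (hypothetical names):
# standard loss plus extra weight on potential "new errors" -- examples the
# previous model got right and the updated model must not get wrong.
import torch
import torch.nn.functional as F

def compatibility_loss(new_logits, old_logits, y, lam=1.0):
    base = F.cross_entropy(new_logits, y, reduction="none")
    old_correct = (old_logits.argmax(dim=1) == y).float()  # where users built trust
    # lam trades raw performance against compatibility with prior experience
    return (base * (1.0 + lam * old_correct)).mean()

new_logits = torch.randn(8, 3, requires_grad=True)  # updated model's outputs
old_logits = torch.randn(8, 3)                      # previous model's outputs
y = torch.randint(0, 3, (8,))
loss = compatibility_loss(new_logits, old_logits, y, lam=2.0)
loss.backward()
```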

#4 Human-in-the-Loop Feature Selection

Authors: Alvaro H. C. Correia ; Freddy Lecue

Feature selection is a crucial step in the conception of machine learning models, yet it is often performed via data-driven approaches that overlook the possibility of tapping into the human decision-making of the model's designers and users. We present a human-in-the-loop framework that interacts with domain experts by collecting their feedback regarding the variables (of a few samples) they evaluate as the most relevant for the task at hand. Such information can be modeled via reinforcement learning to derive a per-example feature selection method that tries to minimize the model's loss function by focusing on the most pertinent variables from a human perspective. We report results on a proof-of-concept image classification dataset and on a real-world risk classification task in which the model successfully incorporated feedback from experts to improve its accuracy.
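
One toy way to realize per-example feature selection with reinforcement learning: a policy samples a feature mask, the reward is the negative classification loss, and a bonus rewards agreement with expert-flagged features. Everything here (the REINFORCE estimator, the expert bonus, all names) is an assumed illustration, not the paper's algorithm.

```python
# Toy per-example feature selection as an RL policy (hypothetical setup).
import torch
import torch.nn as nn

d = 10
policy = nn.Sequential(nn.Linear(d, d), nn.Sigmoid())  # per-feature keep probability
clf = nn.Linear(d, 2)
opt = torch.optim.Adam(list(policy.parameters()) + list(clf.parameters()), lr=1e-2)
ce = nn.CrossEntropyLoss(reduction="none")
expert_mask = torch.zeros(d); expert_mask[:3] = 1.0  # expert flags features 0-2

x, y = torch.randn(64, d), torch.randint(0, 2, (64,))
for _ in range(200):
    probs = policy(x)
    m = torch.bernoulli(probs)                         # sampled feature mask (action)
    logp = (m * probs.clamp_min(1e-6).log()
            + (1 - m) * (1 - probs).clamp_min(1e-6).log()).sum(dim=1)
    loss_per_ex = ce(clf(x * m), y)
    # reward: low loss, plus a small bonus for keeping expert-flagged features
    reward = -loss_per_ex.detach() + 0.1 * (m * expert_mask).sum(dim=1)
    opt.zero_grad()
    ((-reward * logp) + loss_per_ex).mean().backward()  # REINFORCE + classifier loss
    opt.step()
```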

#5 Verifying Robustness of Gradient Boosted Models

Authors: Gil Einziger ; Maayan Goldstein ; Yaniv Sa’ar ; Itai Segall

Gradient boosted models are a fundamental machine learning technique. Robustness to small perturbations of the input is an important quality measure for machine learning models, but the literature lacks a method to prove the robustness of gradient boosted models. This work introduces VERIGB, a tool for quantifying the robustness of gradient boosted models. VERIGB encodes the model and the robustness property as an SMT formula, which enables state-of-the-art verification tools to prove the model's robustness. We extensively evaluate VERIGB on publicly available datasets and demonstrate a capability for verifying large models. Finally, we show that some model configurations tend to be inherently more robust than others.
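
In the same spirit (though not the authors' actual encoding), a toy SMT robustness query for a two-stump ensemble can be posed with the z3-solver Python bindings: if the query for a decision-flipping perturbation is unsatisfiable, the model is robust at that point.

```python
# Toy SMT robustness check for a tiny tree ensemble (illustrative encoding,
# not VERIGB's). We ask whether any input within an L-infinity ball of eps
# around x0 flips the ensemble score's sign.
from z3 import Real, If, Solver, And, sat  # pip install z3-solver

x0 = 0.5          # concrete input; score(x0) > 0
eps = 0.1
xp = Real("xp")   # symbolic perturbed input

def stump(v, thr, left, right):
    return If(v < thr, left, right)   # one decision stump as an SMT term

score = stump(xp, 0.3, -1.0, 0.7) + stump(xp, 0.8, 0.5, -1.2)

s = Solver()
s.add(And(xp >= x0 - eps, xp <= x0 + eps))
s.add(score <= 0)  # counterexample: a perturbation that flips the decision
if s.check() == sat:
    print("not robust, counterexample:", s.model()[xp])
else:
    print("robust within eps =", eps)
```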

#6 Counterfactual Randomization: Rescuing Experimental Studies from Obscured Confounding

Authors: Andrew Forney ; Elias Bareinboim

Randomized clinical trials (RCTs) like those conducted by the FDA provide medical practitioners with average effects of treatments, and are generally more desirable than observational studies due to their control of unobserved confounders (UCs), viz., latent factors that influence both treatment and recovery. However, recent results from causal inference have shown that randomization results in a subsequent loss of information about the UCs, which may impede treatment efficacy if left uncontrolled in practice (Bareinboim, Forney, and Pearl 2015). Our paper presents a novel experimental design that can be noninvasively layered atop past and future RCTs to not only expose the presence of UCs in a system, but also reveal patient- and practitioner-specific treatment effects in order to improve decision-making. Applications are given to personalized medicine, second opinions in diagnosis, and employing offline results in online recommender systems.
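
A toy simulation, purely illustrative of the cited phenomenon rather than the paper's design: when an unobserved confounder drives both a patient's treatment intent and the treatment effect, the RCT's average effect can mask opposite intent-specific effects.

```python
# Toy demonstration that randomization averages over unobserved confounders
# (UCs): the intent-specific effect differs from the average effect.
# All numbers and structure here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.integers(0, 2, n)             # unobserved confounder
intent = u                            # the UC drives what patients would choose
treat = rng.integers(0, 2, n)         # RCT: treatment assigned at random
# recovery depends on treatment *and* the UC (effect reverses across u)
p = np.where(u == 0, np.where(treat == 1, 0.7, 0.4),
                     np.where(treat == 1, 0.3, 0.6))
y = rng.random(n) < p

print("average effect:", y[treat == 1].mean() - y[treat == 0].mean())  # ~0
for i in (0, 1):
    m = intent == i
    print(f"effect given intent={i}:",
          y[m & (treat == 1)].mean() - y[m & (treat == 0)].mean())     # ~±0.3
```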

#7 Efficiently Combining Human Demonstrations and Interventions for Safe Training of Autonomous Systems in Real-Time

Authors: Vinicius G. Goecks ; Gregory M. Gremillion ; Vernon J. Lawhern ; John Valasek ; Nicholas R. Waytowich

This paper investigates how to utilize different forms of human interaction to safely train autonomous systems in real time by learning from both human demonstrations and interventions. We implement two components of the Cycle-of-Learning for Autonomous Systems, our framework for combining multiple modalities of human interaction. The current effort employs human demonstrations to teach a desired behavior via imitation learning, then leverages intervention data to correct for undesired behaviors produced by the imitation learner, safely teaching novel tasks to an autonomous agent after only minutes of training. We demonstrate this method in an autonomous perching task using a quadrotor with continuous roll, pitch, yaw, and throttle commands and imagery captured from a downward-facing camera in a high-fidelity simulated environment. Our method improves task completion performance for the same amount of human interaction compared to learning from demonstrations alone, while also requiring on average 32% less data to achieve that performance. This provides evidence that combining multiple modes of human interaction can increase both the training speed and overall performance of policies for autonomous systems.
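
A bare-bones sketch of the two stages, assuming a supervised (behavior cloning) learner in PyTorch; the network, data shapes, and training loop are illustrative assumptions, not the authors' implementation.

```python
# Sketch: behavior-clone on demonstrations, then keep fitting the same
# policy on human intervention corrections (hypothetical setup).
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(),
                       nn.Linear(64, 4))  # obs -> roll/pitch/yaw/throttle
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
mse = nn.MSELoss()

def fit(obs, act, epochs=50):
    for _ in range(epochs):
        opt.zero_grad()
        mse(policy(obs), act).backward()
        opt.step()

demo_obs, demo_act = torch.randn(512, 8), torch.randn(512, 4)
fit(demo_obs, demo_act)                      # stage 1: imitation learning

# stage 2: the human intervenes when the policy misbehaves; corrective
# actions are appended and the policy is updated on the combined data
interv_obs, interv_act = torch.randn(64, 8), torch.randn(64, 4)
fit(torch.cat([demo_obs, interv_obs]), torch.cat([demo_act, interv_act]))
```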

#8 Task Transfer by Preference-Based Cost Learning

Authors: Mingxuan Jing ; Xiaojian Ma ; Wenbing Huang ; Fuchun Sun ; Huaping Liu

The goal of task transfer in reinforcement learning is to migrate the action policy of an agent from a source task to a target task. Given their successes in robotic action planning, current methods mostly rely on two requirements: exactly-relevant expert demonstrations or an explicitly-coded cost function for the target task, both of which, however, are inconvenient to obtain in practice. In this paper, we relax these two strong conditions by developing a novel task transfer framework in which expert preference is applied as guidance. In particular, we alternate between the following two steps: first, experts apply pre-defined preference rules to select expert demonstrations related to the target task; second, based on the selection result, we learn the target cost function and trajectory distribution simultaneously via enhanced Adversarial MaxEnt IRL, and generate more trajectories from the learned target distribution for the next round of preference selection. Theoretical analysis of the distribution learning and of the convergence of the proposed algorithm is provided. Extensive simulations on several benchmarks further verify the effectiveness of the proposed method.

#9 A Unified Framework for Planning in Adversarial and Cooperative Environments

Authors: Anagha Kulkarni ; Siddharth Srivastava ; Subbarao Kambhampati

Users of AI systems may rely upon them to produce plans for achieving desired objectives. Such AI systems should be able to compute obfuscated plans whose execution in adversarial situations protects privacy, as well as legible plans which are easy for team members to understand in cooperative situations. We develop a unified framework that addresses these dual problems by computing plans with a desired level of comprehensibility from the point of view of a partially informed observer. For adversarial settings, our approach produces obfuscated plans with observations that are consistent with at least k goals from a set of decoy goals. By slightly varying our framework, we present an approach for producing legible plans in cooperative settings such that the observation sequence projected by the plan is consistent with at most j goals from a set of confounding goals. In addition, we show how the observability of the observer can be controlled to either obfuscate or convey the actions in a plan when the goal is known to the observer. We present theoretical results on the complexity analysis of our approach. We also present an empirical evaluation to show the feasibility and usefulness of our approaches using IPC domains.
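
The k/j conditions can be stated compactly in code. Below is a hypothetical consistency test and the two checks; the encoding of goals and observations is invented for illustration.

```python
# Sketch of the k-obfuscation / j-legibility checks described above:
# an observation sequence is obfuscating if it remains consistent with at
# least k decoy goals, and legible if with at most j confounding goals.
def consistent(obs_seq, goal):
    """Dummy consistency test: a goal is consistent if every observation
    could be emitted by some plan for that goal (hypothetical encoding)."""
    return all(o in goal["possible_obs"] for o in obs_seq)

def is_k_obfuscated(obs_seq, decoy_goals, k):
    return sum(consistent(obs_seq, g) for g in decoy_goals) >= k

def is_j_legible(obs_seq, confounding_goals, j):
    return sum(consistent(obs_seq, g) for g in confounding_goals) <= j

goals = [{"possible_obs": {"a", "b"}}, {"possible_obs": {"a", "c"}},
         {"possible_obs": {"a"}}]
print(is_k_obfuscated(["a"], goals, k=2))    # True: "a" fits all three goals
print(is_j_legible(["a", "b"], goals, j=1))  # True: only the first goal fits
```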

#10 RGBD Based Gaze Estimation via Multi-Task CNN

Authors: Dongze Lian ; Ziheng Zhang ; Weixin Luo ; Lina Hu ; Minye Wu ; Zechao Li ; Jingyi Yu ; Shenghua Gao

This paper tackles RGBD-based gaze estimation with Convolutional Neural Networks (CNNs). Specifically, we propose to decompose gaze point estimation into eyeball pose, head pose, and 3D eye position estimation. Compared with RGB image-based gaze tracking, the depth modality helps to facilitate head pose estimation and 3D eye position estimation. The captured depth image, however, usually contains noise and black holes, which noticeably hamper gaze tracking. Thus we propose a CNN-based multi-task learning framework to simultaneously refine depth images and predict gaze points. We utilize a generator network for depth image generation within a Generative Adversarial Network (GAN), where the generator is partially shared between the gaze tracking network and the GAN-based depth synthesis. By optimizing the whole network simultaneously, depth image synthesis improves gaze point estimation and vice versa. Since the only existing RGBD gaze dataset (EYEDIAP) is too small, we build a large-scale RGBD gaze tracking dataset for performance evaluation. As far as we know, it is the largest RGBD gaze dataset in terms of the number of participants. Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the EYEDIAP dataset.
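
A much-reduced PyTorch sketch of the shared-generator idea: one encoder feeds both a depth-refinement decoder (adversarially trained in the paper) and a gaze regression head. Layer sizes and names are placeholders, far smaller than the paper's architecture.

```python
# Illustrative multi-task net: shared features serve depth refinement and
# gaze estimation, so improving one task can improve the other.
import torch
import torch.nn as nn

class MultiTaskGaze(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
                                    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.depth_decoder = nn.Conv2d(16, 1, 3, padding=1)    # refined depth map
        self.gaze_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                       nn.Linear(16, 2))       # 2D gaze point

    def forward(self, rgbd):
        f = self.shared(rgbd)            # features shared by both tasks
        return self.depth_decoder(f), self.gaze_head(f)

net = MultiTaskGaze()
rgbd = torch.randn(2, 4, 64, 64)         # RGB + noisy depth channels
depth, gaze = net(rgbd)
print(depth.shape, gaze.shape)           # (2, 1, 64, 64), (2, 2)
```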

#11 Deep Neural Networks Constrained by Decision Rules

Authors: Yuzuru Okajima ; Kunihiko Sadamasa

Deep neural networks achieve high predictive accuracy by learning latent representations of complex data. However, the reasoning behind their decisions is difficult for humans to understand. On the other hand, rule-based approaches are able to justify decisions by showing the decision rules that lead to them, but they have relatively low accuracy. To improve the interpretability of neural networks, several techniques provide post-hoc explanations of decisions made by neural networks, but they cannot guarantee that the decisions are always explained in a simple form like decision rules, because their explanations are generated after the decisions are made. In this paper, to balance the accuracy of neural networks against the interpretability of decision rules, we propose a hybrid technique called rule-constrained networks: neural networks that make decisions by selecting decision rules from a given ruleset. Because the networks are forced to make decisions based on decision rules, every decision is guaranteed to be supported by a decision rule. Furthermore, we propose a technique to jointly optimize the neural network and the ruleset from which the network selects rules. The log-likelihood of correct classifications is maximized under a model with hyperparameters governing the ruleset size and the prior probabilities of rules being selected. This makes it possible to limit the ruleset size or to prioritize human-made rules over automatically acquired rules, promoting the interpretability of the output. Experiments on time-series and sentiment classification datasets showed that rule-constrained networks achieved accuracy as high as that of the original neural networks and significantly higher than that of existing rule-based models, while presenting decision rules that support their decisions.
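
One toy way to make a network decide by selecting rules, so every prediction is backed by a rule: score the rules, mask out inapplicable ones, and maximize the likelihood of selecting a rule with the correct label. The formulation below is an assumed simplification, not the paper's model.

```python
# Toy rule-constrained decision: the network scores rules, and the decision
# is the selected rule's class, so each output has a supporting rule.
import torch
import torch.nn as nn

# ruleset: (feature index, threshold, class), read "if x[i] > t then class c";
# the last entry is a catch-all default rule that always applies
rules = [(0, 0.0, 1), (1, 0.5, 0), (None, None, 1)]
labels = torch.tensor([c for _, _, c in rules])

scorer = nn.Linear(4, len(rules))     # the network scores each rule per example
opt = torch.optim.Adam(scorer.parameters(), lr=1e-2)

def applicable(x):                    # boolean mask of rules whose condition holds
    cols = [torch.ones(len(x), dtype=torch.bool) if i is None else x[:, i] > t
            for i, t, _ in rules]
    return torch.stack(cols, dim=1)

x, y = torch.randn(32, 4), torch.randint(0, 2, (32,))
for _ in range(200):
    logits = scorer(x).masked_fill(~applicable(x), float("-inf"))
    p_rule = torch.softmax(logits, dim=1)                    # P(select rule | x)
    p_correct = (p_rule * (labels[None, :] == y[:, None]).float()).sum(dim=1)
    loss = -p_correct.clamp_min(1e-6).log().mean()           # log-likelihood of a correct rule
    opt.zero_grad(); loss.backward(); opt.step()

chosen = scorer(x).masked_fill(~applicable(x), float("-inf")).argmax(dim=1)
print("predictions:", labels[chosen][:5], "supporting rule index:", chosen[:5])
```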

#12 Geometry-Aware Face Completion and Editing

Authors: Linsen Song ; Jie Cao ; Lingxiao Song ; Yibo Hu ; Ran He

Face completion is a challenging generation task because it requires generating visually pleasing new pixels that are semantically consistent with the unmasked face region. This paper proposes a geometry-aware Face Completion and Editing NETwork (FCENet) that systematically exploits facial geometry estimated from the unmasked region. First, a facial geometry estimator is learned to estimate facial landmark heatmaps and parsing maps from the unmasked face image. Then, an encoder-decoder generator completes the face image and disentangles its mask areas, conditioned on both the masked face image and the estimated facial geometry. In addition, since manually labeled masks exhibit a low-rank property, a low-rank regularization term is imposed on the disentangled masks, enabling our completion network to handle occlusion areas of various shapes and sizes. Furthermore, our network can generate diverse results from the same masked input by modifying the estimated facial geometry, which provides a flexible means of editing the completed face appearance. Extensive experimental results qualitatively and quantitatively demonstrate that our network generates visually pleasing face completion results and can edit face attributes as well.
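
The low-rank mask regularization can be written as a nuclear-norm penalty; the snippet below is an illustrative guess at such a term, not the paper's exact loss.

```python
# Illustrative low-rank prior on a predicted occlusion mask: the nuclear
# norm (sum of singular values) is a standard convex surrogate for rank.
import torch

mask = torch.rand(64, 64, requires_grad=True)  # stand-in for the disentangled mask
low_rank_penalty = torch.linalg.matrix_norm(mask, ord="nuc")
low_rank_penalty.backward()  # would be added to the completion loss with a weight
```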

#13 Generation of Policy-Level Explanations for Reinforcement Learning

Authors: Nicholay Topin ; Manuela Veloso

Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, O(|F|^2 * |tr_samples|). By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains.
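
The construction can be illustrated by reducing it to counting transitions between abstract states; the abstraction function below is a simple stand-in, whereas the paper derives its abstraction from the learned value function.

```python
# Sketch: estimate a Markov chain over abstract states from logged
# transitions (an illustrative reduction of Abstracted Policy Graphs).
from collections import Counter, defaultdict

def abstract(state):          # stand-in abstraction: sign pattern of features
    return tuple(f > 0 for f in state)

transitions = [((0.2, -1.0), (0.5, -0.3)),   # (s, s') pairs observed under the policy
               ((0.5, -0.3), (-0.1, 0.4)),
               ((0.3, -0.9), (0.6, -0.2))]

counts = defaultdict(Counter)
for s, s_next in transitions:
    counts[abstract(s)][abstract(s_next)] += 1

for a, nexts in counts.items():              # empirical transition probabilities
    total = sum(nexts.values())
    for b, c in nexts.items():
        print(f"P({b} | {a}) = {c / total:.2f}")
```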

#14 Learning Models of Sequential Decision-Making with Partial Specification of Agent Behavior

Authors: Vaibhav V. Unhelkar ; Julie A. Shah

Artificial agents that interact with other (human or artificial) agents require models in order to reason about those other agents' behavior. In addition to the predictive utility of these models, maintaining a model that is aligned with an agent's true generative model of behavior is critical for effective human-agent interaction. In applications where observations and a partial specification of the agent's behavior are available, achieving model alignment is challenging for a variety of reasons. For one, the agent's decision factors are often not completely known; further, prior approaches that rely on observations of agents' behavior alone can fail to recover the true model, since multiple models can explain observed behavior equally well. To achieve better model alignment, we provide a novel approach capable of learning aligned models that conform to partial knowledge of the agent's behavior. Central to our approach are a factored model of behavior (AMM), Bayesian nonparametric priors, and an inference approach capable of incorporating partial specifications as constraints for model learning. We evaluate our approach in experiments and demonstrate improvements in metrics of model alignment.

#15 Augmenting Markov Decision Processes with Advising

Authors: Loïs Vanhée ; Laurent Jeanpierre ; Abdel-Illah Mouaddib

This paper introduces Advice-MDPs, an expansion of Markov Decision Processes for generating policies that take into consideration advice on the desirability, undesirability, and prohibition of certain states and actions. Advice-MDPs enable the design of semi-autonomous systems (systems that require operator support for handling at least certain situations) that can efficiently handle unexpected, complex environments. Operators, through advising, can augment the planning model to cover unexpected real-world irregularities. This advising can swiftly augment the degree of autonomy of the system, so that it can work without subsequent human intervention. This paper details the Advice-MDP formalism, a fast Advice-MDP resolution algorithm, and its applicability to real-world tasks, via the design of a professional-class semi-autonomous robot system ready to be deployed in a wide range of unexpected environments and capable of efficiently integrating operator advice.
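
A toy value-iteration reading of advice, assuming advice acts by shifting state rewards (desirability/undesirability) and forbidding state-action pairs (prohibition); the paper's formalism is richer, and every detail below is illustrative.

```python
# Toy value iteration with operator advice folded into the model.
import numpy as np

n_s, n_a, gamma = 4, 2, 0.9
P = np.full((n_s, n_a, n_s), 1.0 / n_s)          # dummy uniform dynamics
R = np.zeros(n_s); R[3] = 1.0                    # base reward: goal state 3

advice_reward = np.array([0.0, -0.5, 0.2, 0.0])  # s1 undesirable, s2 desirable
prohibited = {(1, 0)}                             # advice: action 0 forbidden in state 1

V = np.zeros(n_s)
for _ in range(100):
    Q = (P * (R + advice_reward + gamma * V)[None, None, :]).sum(axis=2)
    for s, a in prohibited:
        Q[s, a] = -np.inf                         # never plan a prohibited action
    V = Q.max(axis=1)
print("policy:", Q.argmax(axis=1), "values:", np.round(V, 2))
```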

#16 FLEX: Faithful Linguistic Explanations for Neural Net Based Model Decisions

Authors: Sandareka Wickramanayake ; Wynne Hsu ; Mong Li Lee

Explaining the decisions of a deep learning network is imperative to safeguard end-user trust. Such explanations must be intuitive, descriptive, and faithful to the reasons why the model makes its decisions. In this work, we propose a framework called FLEX (Faithful Linguistic EXplanations) that generates post-hoc linguistic justifications to rationalize the decisions of a Convolutional Neural Network. FLEX explains a model's decision in terms of the features that are responsible for it. We derive a novel way to associate such features with words, and introduce a new decision-relevance metric that measures the faithfulness of an explanation to a model's reasoning. Experimental results on two benchmark datasets demonstrate that the proposed framework generates more discriminative and faithful explanations than state-of-the-art explanation generators. We also show how FLEX can generate explanations for images of unseen classes and automatically annotate objects in images.

#17 Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning

Authors: Ziyu Yao ; Xiujun Li ; Jianfeng Gao ; Brian Sadler ; Huan Sun

Given a text description, most existing semantic parsers synthesize a program in one shot. However, it is quite challenging to produce a correct program solely based on the description, which in reality is often ambiguous or incomplete. In this paper, we investigate interactive semantic parsing, where the agent can ask the user clarification questions to resolve ambiguities via a multi-turn dialogue, on an important type of programs called “If-Then recipes.” We develop a hierarchical reinforcement learning (HRL) based agent that significantly improves the parsing performance with minimal questions to the user. Results under both simulation and human evaluation show that our agent substantially outperforms non-interactive semantic parsers and rule-based agents.
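
At its simplest, the interaction pattern amounts to: predict each recipe slot, and ask the user only when confidence is low. The sketch below hard-codes a stand-in parser and is not the paper's HRL agent.

```python
# Bare-bones interactive If-Then parsing loop (hypothetical, far simpler
# than the paper's hierarchical agent).
SLOTS = ["trigger_channel", "trigger_function", "action_channel", "action_function"]

def predict_slot(description, slot):
    """Stand-in for a learned parser: returns (best guess, confidence)."""
    guesses = {"trigger_channel": ("Weather", 0.9),
               "trigger_function": ("Tomorrow's forecast", 0.4),
               "action_channel": ("SMS", 0.8),
               "action_function": ("Send me a text", 0.7)}
    return guesses[slot]

def parse(description, threshold=0.5):
    recipe = {}
    for slot in SLOTS:
        guess, conf = predict_slot(description, slot)
        if conf < threshold:   # uncertain: ask a clarification question instead
            guess = input(f"What should the {slot} be? (my guess: {guess}) ")
        recipe[slot] = guess
    return recipe

print(parse("text me the weather tomorrow"))
```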